1.  INTRODUCTION


Savin [6] and Berndt and Savin [2] have shown that an inequality relation
exists between different test statistics used for testing hypotheses of the
form $r - R\beta = 0$. They found that the value of the likelihood ratio test
statistic ($LR = -2 \log \lambda$), the Wald test statistic (W), and the Lagrange
multiplier test statistic (LM) are always such that


(1)  $W \geq LR \geq LM$


This result has been generalized by Breusch [2], who showed that the
only necessary assumption for this inequality to hold is that the
disturbances follow a distribution which allows maximum-likelihood estimation.
However, neither  Breusch  nor  any  of  the  authors  before  him  were  able  to 
conclude  anything  about  the  power  of  the  different  tests.  In  this  paper  it 
will  be  shown  that  for  finite  but  large  samples  a  similar  inequality  relation 
to  (1)  exists  between  the  powers  of  the  three  tests.  The  Wald  test  is 
uniformly  more  powerful  than  either  of  the  other  two  tests,  and  the 
likelihood  ratio  test  is  more  powerful  than  the  Lagrange  multiplier  test 
for very large samples and for moderate-to-large differences between the null
hypothesis and the true value of the tested parameters.

The assumption of a scalar covariance matrix is made to simplify the
exposition. The results can probably be generalized to hold for any
disturbance vector which allows maximum-likelihood estimation.


2.  THE  MODEL 

Consider  the  model: 

(2)  $Y = X\beta + \varepsilon$

(3)  $\varepsilon \sim N(0, \sigma^2 I)$

Let $\hat\beta$ and $\hat\sigma^2$ be the maximum-likelihood estimators obtained by
unconstrained maximization of the likelihood function, and $\tilde\beta$ and $\tilde\sigma^2$ the
corresponding constrained estimators. Furthermore, we shall need an
estimator for the Lagrange multiplier ($\hat\mu$) and the ratio of the constrained
to unconstrained maxima of the likelihood function ($\lambda$). The three test
statistics can be written as:

(4)  $LR = -2 \log \lambda$

(5)  $W = n\,[(r - R\hat\beta)' (R (\tfrac{1}{n} X'X)^{-1} R')^{-1} (r - R\hat\beta)] / \hat\sigma^2$

(6)  $LM = n\,[\hat\mu' (R (\tfrac{1}{n} X'X)^{-1} R') \hat\mu] / \tilde\sigma^2$

To simplify the notation let $A = R (\tfrac{1}{n} X'X)^{-1} R'$. From the first-order
conditions for maximizing the likelihood function subject to the constraint
we can obtain an expression for $\hat\mu$:

(7)  $\hat\mu = [R (\tfrac{1}{n} X'X)^{-1} R']^{-1} (r - R\hat\beta) = A^{-1} (r - R\hat\beta)$

We can now rewrite (5) and (6) as

(8)  $W = n\,[(r - R\hat\beta)' A^{-1} (r - R\hat\beta)] / \hat\sigma^2$

(9)  $LM = n\,[(r - R\hat\beta)' A^{-1} (r - R\hat\beta)] / \tilde\sigma^2$

If the null hypothesis is true and the disturbances are normally
distributed, we have

(10)  $\sqrt{n}\,(r - R\hat\beta) \sim N(0, \sigma^2 A)$

Furthermore, $\tilde\sigma^2$ is a consistent estimator for $\sigma^2$ if the null hypothesis
holds; $\hat\sigma^2$ is a consistent estimator regardless of whether the null hypothesis
is true or not.

It  follows  that  under  the  null  hypothesis,  W  and  LM  converge  to  the 
same  limiting  Chi-square  distribution  with  k  degrees  of  freedom,  where  k 
is  the  rank  of  R  .  Independently  it  can  be  shown  that  LR  also  converges  to 
the same Chi-square distribution. Combined with the inequality relation (1)
this gives rise to possibly conflicting test results.1
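
As a minimal sketch of how the three statistics in (4), (8) and (9) can be computed, the following Python fragment evaluates them for the simple location model Y = a + e with null hypothesis a = 0 (the model the paper uses in its numerical example). For that model X is a column of ones, R = 1 and r = 0, so A = 1. The function name and the sample values are illustrative assumptions, and LR is computed from the ratio of the constrained to the unconstrained variance estimate, which equals -2 log of the likelihood ratio under normality.

```python
import math

def three_statistics(y):
    """Return (W, LR, LM) for H0: a = 0 in the model Y = a + e.

    Assumes X is a column of ones, R = 1 and r = 0, so that A = 1.
    """
    n = len(y)
    mean = sum(y) / n
    # Unconstrained ML variance estimate (divides by n, not n - 1).
    s2_hat = sum((v - mean) ** 2 for v in y) / n
    # Constrained ML variance estimate with a fixed at zero.
    s2_tilde = sum(v * v for v in y) / n
    wald = n * mean ** 2 / s2_hat
    lm = n * mean ** 2 / s2_tilde
    lr = n * math.log(s2_tilde / s2_hat)
    return wald, lr, lm

# Hypothetical sample, chosen only for illustration.
sample = [0.3, -1.1, 0.8, 1.4, -0.2, 0.9, 0.1, -0.6, 1.2, 0.5]
W, LR, LM = three_statistics(sample)
print(W >= LR >= LM)  # True: the ordering in (1)
```

All three statistics are compared with the same Chi-square critical value, which is why the ordering can produce conflicting decisions.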

3.  APPROXIMATE  RELATIVE  POWER  OF  THE  THREE  TESTS 

It is possible to evaluate the power functions in the case of finite but
large samples.2 When the null hypothesis is not true, we have:

(11)  $r - R\beta = a$

Therefore,

(12)  $\sqrt{n}\,(r - R\hat\beta) \sim N(\sqrt{n}\,a, \sigma^2 A)$

For large samples we have approximately

(13)  $\sqrt{n}\,(r - R\hat\beta)/\hat\sigma \approx N(\sqrt{n}\,a/\sigma, A)$

If $a$ is different from zero and n is large, the quadratic form W
(see (8)) follows approximately a noncentral Chi-square distribution3 with k
degrees of freedom and non-centrality parameter

(14)  $c(n) = n\,[a/\sigma]' A^{-1} [a/\sigma]$ 4

Let f(W) be the density function of the Wald test statistic under the
alternative hypothesis and $\chi^2\{0,k\}$ the critical value based on a central
Chi-square distribution with k degrees of freedom. The power of the Wald test is then

(15)  $P(W) = \int_{\chi^2\{0,k\}}^{\infty} f(W)\,dW$

For  large  samples  we  can  approximate  the  density  function  f(W)  by  a 
non-central  Chi-square  density  function  with  non-centrality  parameter  c(n). 

As n increases, c(n) grows without limit, and the distribution of W
explodes. From (15) we see that the asymptotic power of the Wald test is
therefore equal to one. A similar expression can be constructed for the
likelihood ratio test (21) and the Lagrange multiplier test (18). Their
asymptotic power is also equal to one.
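
The approach of the approximate Wald power to one can be illustrated numerically. The sketch below assumes a single restriction (k = 1) with A = 1, so that the non-centrality parameter reduces to n times the squared standardized deviation. It uses the standard fact that a noncentral Chi-square variable with one degree of freedom is the square of a shifted standard normal. The function names and the chosen parameter values are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def noncentral_chi2_tail_1df(limit, c):
    """P(V > limit) for V noncentral chi-square, 1 d.f., noncentrality c.

    Uses the fact that V = (Z + sqrt(c))**2 with Z standard normal.
    """
    root, shift = math.sqrt(limit), math.sqrt(c)
    return 1.0 - normal_cdf(root - shift) + normal_cdf(-root - shift)

CRIT_5PCT_1DF = 3.841458820694124  # 5% critical value of central chi-square(1)

def wald_power(a, sigma, n):
    """Approximate power from (15) with c(n) = n * (a / sigma)**2 and A = 1."""
    c = n * (a / sigma) ** 2
    return noncentral_chi2_tail_1df(CRIT_5PCT_1DF, c)

# Power rises toward one as the sample size grows for a fixed deviation.
for n in (25, 100, 400, 1600):
    print(n, round(wald_power(0.2, 1.0, n), 4))
```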

This is a common feature of all point-hypothesis tests based on a
consistent parameter estimate. It is, though comforting to the theoretician,
of little help to the practitioner trying to decide which test to use. It
says, if anything, that if we had a truly infinite sample, it would not
matter which test we applied, as long as it was based on a consistent
parameter estimate.5 It definitely does not imply that the tests are of
equal power for finite but large samples, which is, alas, the best we can
do in the real world. From (8) and (9) we can obtain an expression for LM.


(16)  $LM = (\hat\sigma^2/\tilde\sigma^2)\,W$

and

(17)  $f(LM) = f[(\hat\sigma^2/\tilde\sigma^2)\,W]$

Given our definition of "large but finite samples," we can set
$\hat\sigma^2/\tilde\sigma^2 = \sigma^2/\mathrm{plim}\,\tilde\sigma^2 = \rho < 1$. It follows that the power of the Lagrange
multiplier test is equal to

(18)  $P(LM) = \int_{\chi^2\{0,k\}}^{\infty} f(\rho W)\,d(\rho W) = \int_{\rho^{-1}\chi^2\{0,k\}}^{\infty} f(W)\,dW$

Given that $\rho^{-1} > 1$ if the null hypothesis is not true, we can conclude
from (15) and (18) that P(LM) < P(W).
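
The comparison of (15) and (18) amounts to evaluating the same upper tail from two different lower limits. A rough numerical sketch, assuming one restriction (k = 1), A = 1, and the probability limit of the constrained variance estimate equal to the true variance plus the squared deviation, is given below. The helper names and the parameter values are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_1df(limit, c):
    """Upper tail of a noncentral chi-square(1) with noncentrality c."""
    root, shift = math.sqrt(limit), math.sqrt(c)
    return 1.0 - normal_cdf(root - shift) + normal_cdf(-root - shift)

CRIT = 3.841458820694124  # 5% critical value, central chi-square(1)

def approx_powers(a, sigma2, n):
    """Return (P(W), P(LM)) using the lower limits in (15) and (18)."""
    c = n * a * a / sigma2            # noncentrality, with A = 1
    rho_inv = 1.0 + a * a / sigma2    # plim of constrained over true variance
    return tail_1df(CRIT, c), tail_1df(rho_inv * CRIT, c)

p_w, p_lm = approx_powers(a=0.3, sigma2=1.0, n=100)
print(p_lm < p_w)  # True: the LM limit lies above the Wald limit
```

When the deviation is zero the two limits coincide and both tests have power equal to the significance level.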

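A similar argument can be made for the relative power of the Wald and
likelihood ratio tests. Consider $\tilde\sigma^2 = (Y - X\tilde\beta)'(Y - X\tilde\beta)/n$ and
$\tilde\beta = \hat\beta + n (X'X)^{-1} R' A^{-1} (r - R\hat\beta)$. Then, letting $e = Y - X\hat\beta$, we get by substitution:

$\tilde\sigma^2 = \tfrac{1}{n} \{ e'e - 2\,[e' X (\tfrac{1}{n} X'X)^{-1} R' A^{-1}] (r - R\hat\beta)$
      $+ (r - R\hat\beta)' A^{-1}\, n^2 [R (X'X)^{-1} X'X (X'X)^{-1} R'] A^{-1} (r - R\hat\beta) \}$
   $= \hat\sigma^2 + (r - R\hat\beta)' A^{-1} (r - R\hat\beta)$

since $X'e = 0$ and $n R (X'X)^{-1} R' = A$.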

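We can, therefore, express (8) as

(19)  $W = n (\tilde\sigma^2 - \hat\sigma^2)/\hat\sigma^2 = n (\tilde\sigma^2/\hat\sigma^2 - 1)$

Making use of the fact that $\lambda = (\hat\sigma^2/\tilde\sigma^2)^{n/2}$, we can write

(20)  $W = n (\lambda^{-2/n} - 1)$

      $LR = -2 \log \lambda = n \log[(W/n) + 1]$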
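
The expression for LR just above, together with (16) and (19), also gives a direct route to the inequality relation (1). The following worked step is a sketch added for illustration; the shorthand x = W/n is introduced here and is not part of the original notation.

```latex
% Shorthand (introduced here): x = W/n \ge 0.
\begin{align*}
  \tilde\sigma^2/\hat\sigma^2 &= 1 + x
      && \text{from (19)} \\
  LR &= n\log(1+x)
      && \text{from the expression for } LR \text{ above} \\
  LM &= (\hat\sigma^2/\tilde\sigma^2)\,W = \frac{n x}{1+x}
      && \text{from (16)} \\
  \frac{x}{1+x} &\le \log(1+x) \le x
      && \text{for all } x \ge 0 \\
  \Longrightarrow\quad LM &\le LR \le W .
\end{align*}
```

Equality holds only at x = 0, that is, when the constrained and unconstrained variance estimates coincide.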

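The power of the likelihood ratio test is

(21)  $P(LR) = \int_{\chi^2\{0,k\}}^{\infty} f(LR)\,dLR$

      $= \int_{\chi^2\{0,k\}}^{\infty} f[n \log(W/n + 1)]\,d[n \log(W/n + 1)]$

      $= \int_{n[-1 + \exp(\chi^2\{0,k\}/n)]}^{\infty} f(W)\,dW$

Consider the lower limit of integration in (21):

(22)  $n[-1 + \exp(\chi^2\{0,k\}/n)] = n[-1 + \sum_{i=0}^{\infty} (\chi^2/n)^i/i!]$

      $= n[\chi^2/n + \sum_{i=2}^{\infty} (\chi^2/n)^i/i!]$

      $= \chi^2 + O(n^{-1})$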


From (22), (21) and (15) it follows that P(LR) < P(W). Also note
from (22) that the difference between P(LR) and P(W) decreases as the
sample size increases.
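
The size of the shift in the lower limit of integration can be checked numerically. The sketch below assumes one restriction and a 5% significance level, so the Chi-square critical value is about 3.84. The function name is an illustrative assumption.

```python
import math

CRIT = 3.841458820694124  # 5% critical value, central chi-square(1)

def lr_limit_in_terms_of_w(crit, n):
    """Lower limit of integration in (21): n * (exp(crit / n) - 1)."""
    return n * math.expm1(crit / n)

for n in (20, 100, 1000, 10000):
    limit = lr_limit_in_terms_of_w(CRIT, n)
    # The excess over the chi-square critical value is roughly crit**2 / (2 n).
    print(n, round(limit, 4), round(limit - CRIT, 4))
```

The limit always exceeds the critical value used for the Wald test, and the excess shrinks at rate 1/n, in line with (22).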

Consider  the  relative  power  of  the  LR  and  LM  tests.  From  (21)  and 
(18)  we  see  that  the  only  difference  in  power  between  the  two  tests  must 
come  from  the  difference  in  the  lower  limit  of  integration.  For  the 
likelihood  ratio  test  this  limit  is: 

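(23)  $\chi^2 + n \sum_{i=2}^{\infty} (\chi^2/n)^i / i! = \chi^2 \left[1 + \sum_{i=1}^{\infty} (\chi^2/n)^i/(i+1)!\right]$

The corresponding limit for the Lagrange multiplier test (see (18)) is:

(24)  $(\mathrm{plim}\,\tilde\sigma^2/\sigma^2)\,\chi^2 = [1 + a' A^{-1} a/\sigma^2]\,\chi^2$

where we have made use of the fact that $\tilde\sigma^2 = \hat\sigma^2 + (r - R\hat\beta)' A^{-1} (r - R\hat\beta)$ and
$\mathrm{plim}\,(r - R\hat\beta) = a$. For a given sample size there exists an $a$, say $a^\circ$, for
which the Lagrange multiplier test and the likelihood ratio test are equally
powerful, i.e., (23) = (24):

(25)  $\chi^2 \left[1 + \sum_{i=1}^{\infty} (\chi^2/n)^i/(i+1)!\right] = \chi^2 [1 + a^{\circ\prime} A^{-1} a^\circ/\sigma^2]$

      $\sum_{i=1}^{\infty} (\chi^2/n)^i/(i+1)! = a^{\circ\prime} A^{-1} a^\circ/\sigma^2$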

From (25) we see that the lower limit of integration in (18) becomes
relatively larger than the corresponding limit in (21) when $a$ increases
above $a^\circ$. This means that the likelihood ratio test becomes more powerful
as the difference between the null hypothesis and the true value of the
tested parameter grows. In other words, the power function of the
likelihood ratio test must intersect the power function of the Lagrange
multiplier test from below, as depicted in Figure 1.

This  result,  however,  is  only  an  outflow  of  our  incongruous  assumption 
about  the  sample  size.  Note  that  the  left-hand  side  of  (25)  is  of  order  1/n  , 
so  that  for  very  large  samples  it  must  be  practically  zero.  This  implies  that 
a°  must  be  practically  zero  as  well.  In  the  limit  it  must  be  exactly  zero. 
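
The crossing point defined by (25) can be computed in closed form, because the series on its left-hand side sums to (exp(x) - 1 - x)/x with x equal to the critical value divided by n. The sketch below assumes a scalar restriction with A = 1 and unit disturbance variance by default. The function names are hypothetical.

```python
import math

CRIT = 3.841458820694124  # 5% critical value, central chi-square(1)

def crossing_noncentrality(crit, n):
    """Left-hand side of (25): sum_{i>=1} (crit/n)**i / (i+1)!.

    The series equals (exp(x) - 1 - x) / x with x = crit / n.
    """
    x = crit / n
    return (math.expm1(x) - x) / x

def crossing_a(crit, n, sigma2=1.0):
    """Scalar a (with A = 1) at which the LR and LM limits coincide."""
    return math.sqrt(sigma2 * crossing_noncentrality(crit, n))

# The crossing point moves toward zero as the sample size grows.
for n in (50, 100, 1000, 100000):
    print(n, round(crossing_a(CRIT, n), 4))
```

For large n the left-hand side is close to the critical value divided by 2n, which is the order-1/n behaviour noted in the text.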

We  can  also  consider  (25)  under  a  somewhat  different  aspect.  Holding 
a°  constant  we  can  determine  the  effect  of  an  increase  in  the  sample  size. 

If n is large enough so that $\tfrac{1}{n} X'X$ has already attained its limit, or
is close to it, an increase in n would not have any noticeable effect on
the right-hand side of (25). The left-hand side would become smaller, however,
leading us to conclude that the likelihood ratio test would become more
powerful.

4.  RELAXING  THE  ASSUMPTIONS  ABOUT  SAMPLE  SIZE 

The  comparison  of  the  relative  power  of  the  Lagrange  multiplier  test 
and  the  Wald  test  in  (18)  depended  heavily  on  our  convenient  definition  of 
large  but  finite  samples.  But  it  is  not  certain  that  there  exists  a  sample 
size for which the conditions of our definition hold. The difference between
$\hat\sigma^2$ and $\sigma^2$ is of order 1/n and c(n) is of order n, so that strictly
speaking $\hat\sigma^2$ approaches $\sigma^2$ at the same speed as c(n) approaches infinity.
If we are prepared to set $\hat\sigma^2 = \sigma^2$ for a very large n, which is what we are
doing  when  we  use  the  asymptotic  distribution  to  test  hypotheses  based  on 
finite  samples,  we  should  apply  the  same  standards  and  set  c(n)  equal  to 
infinity.  Under  these  circumstances  all  the  tests  are  of  power  one. 


[Figure 1: power functions P(LR) and P(LM)]

In order to be able to say something about the relative power of the
Lagrange multiplier test and the Wald test for finite samples we must treat
$\hat\sigma^2$ and $\tilde\sigma^2$ as stochastic. We can define a critical value $d_\gamma > 0$ of the
random variable $\tilde\sigma^2/\hat\sigma^2$ such that

(26)  $\Pr\{\tilde\sigma^2/\hat\sigma^2 > 1 + d_\gamma\} = \gamma$
Then we can rewrite (18) as

(27)  [equation missing in source]

As $\gamma$ approaches unity, $d_\gamma$ approaches zero so that we have

(28)  [equation missing in source]

or $P(LM) \leq P(W)$.

This comparison of P(LM) and P(W) no longer relies on any special
assumptions about the sample size. It applies to any sample size for which
the Chi-square distribution adequately describes the distribution of W,
i.e., for which a Chi-square test is justified.

Qualitatively our results have changed little. Because $\tilde\sigma^2$ and $\hat\sigma^2$
are stochastic, we have to allow for the possibility that they may be equal
by a fluke, even if the null hypothesis is not true. In this case all three
test statistics take on the value zero and commit a type two error. This is
the main reason why the strict inequality derived from (18) has to be changed
to $\leq$. However, with any probability $\gamma$ of less than one, $d_\gamma$ in (27) is > 0
and the strict inequality still holds.


5.  A  NUMERICAL  EXAMPLE 


A small Monte-Carlo experiment was conducted to determine the extent
to which the power of the different tests varies. The model was simply
$Y = a + e$ with e distributed normally and independently with mean zero and
variance one. The null hypothesis was that $a = 0$. Table I summarizes
the results for the tests using 100 samples of size 100 each.

The results are not very surprising. Indeed, the Wald test is more
powerful than the other two tests. As $a$ increases, the power of all three tests
goes to one. The relative advantage of the Wald test is strongest for
small values of $a$.
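
An experiment of this kind can be sketched in a few lines of Python. The code below is an illustration under stated assumptions rather than the original program: the random number generator, the seed, and the function name are all hypothetical, and the 5% critical value for one degree of freedom is hard-coded. Because the original random draws are not available, the counts it prints should not be expected to match Table I.

```python
import math
import random

CRIT = 3.841458820694124  # 5% critical value, central chi-square(1)

def rejections(a, n_samples=100, n=100, seed=0):
    """Count rejections of H0: a = 0 by the W, LR and LM tests."""
    rng = random.Random(seed)
    counts = {"W": 0, "LR": 0, "LM": 0}
    for _ in range(n_samples):
        y = [a + rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(y) / n
        s2_hat = sum((v - mean) ** 2 for v in y) / n    # unconstrained
        s2_tilde = sum(v * v for v in y) / n            # constrained (a = 0)
        stats = {
            "W": n * mean ** 2 / s2_hat,
            "LR": n * math.log(s2_tilde / s2_hat),
            "LM": n * mean ** 2 / s2_tilde,
        }
        for name, value in stats.items():
            counts[name] += value > CRIT
    return counts

for a in (0.2, 0.3, 0.4, 0.5, 1.0):
    print(a, rejections(a))
```

Since the three statistics are ordered sample by sample, the rejection counts are ordered in the same way for every value of a.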


6.  CONCLUSIONS 

It  is  true  that  both  the  likelihood  ratio  test  and  the  Lagrange 
multiplier  test  have  asymptotically  the  same  power  as  the  Wald  test.  For 
the  likelihood  ratio  test  this  is  primarily  due  to  the  fact  that  the  difference 
in  power  between  the  two  tests  vanishes,  as  n  goes  to  infinity  (see  (21)). 
However, the Lagrange multiplier test has asymptotically the same power as the
Wald test only due to the fact that the limiting distribution of both tests
is not defined. The difference between the lower limits of integration in (18)
and (15) converges to a number different from zero. Loosely speaking, one
could state that if the limiting distribution of W under the alternative
hypothesis were defined, the likelihood ratio test would still be asymptotically
of equal power to the Wald test, while the Lagrange multiplier test would be
uniformly less powerful.
TABLE I

NUMBER OF REJECTIONS PER HUNDRED

True Parameter Value    Wald Test    LR Test    LM Test
a = .2                      25          21         20
a = .3                      51          50         48
a = .4                      88          86         85
a = .5                      98          98         98
a = 1                      100         100        100

Significance level: 5%


It is trivial to note that two tests have the same asymptotic power, if
this statement depends on the fact that the limiting distribution explodes.
We must instead consider the relative power for sample sizes short of infinity.
In this paper it was shown that there exists an inequality relation similar
to (1) between the power of the three tests for large but finite samples.
The Wald test is uniformly more powerful than either of the other two tests
if we accept the stated definition of "large but finite" samples. If we
are not able to set $\hat\sigma^2 = \sigma^2$ and $\tilde\sigma^2 = \mathrm{plim}\,\tilde\sigma^2$ for a given sample size, the
strict inequality relation has to be modified to $P(LM) \leq P(W)$. However,
P(LR) < P(W) apparently still holds, because this comparison does not depend
on setting $\hat\sigma^2 = \sigma^2$.

It follows that for finite samples the Wald test is at least as powerful
as the other two tests. If it is no more difficult or
costly  to  compute,  then  there  appears  to  be  little  justification  for  using 
either  the  Lagrange  multiplier  test  or  the  likelihood  ratio  test  for  the 
purpose  of  testing  linear  restrictions  in  a  linear  regression  model.6 

We could view the problem as one of misspecification of the critical
region. If the different tests are proportional to each other (e.g.,
$LM = (\hat\sigma^2/\tilde\sigma^2)\,W$; see expression (16)), they should not have the same
critical values. If we define the critical value of the LM test as $(\hat\sigma^2/\tilde\sigma^2)$
times the critical value of the W test, the two tests are, of course, equally
powerful for all sample sizes. A similar argument applies for the critical
value of the LR test.
FOOTNOTES


1  It has to be pointed out that, of course, no conflict arises if the
Wald test accepts the null hypothesis or the Lagrange multiplier test
rejects it.

2  "Finite but large" samples are small enough so that 1/n is still
different from zero, but the limiting distribution is an adequate approximation
to the sampling distribution. This is the way in which we always use
limiting results in practice.

3  For small samples, the appropriate statistic to use would, of course, be
$F = (\hat\sigma^2 / (k s^2))\,W$, where $s^2$ is the (unbiased) least squares estimator for $\sigma^2$.

4  See Kendall and Stuart [5], p. 237-41.

5  Under these circumstances, hypothesis testing becomes trivial, because
in the limit there are no 'unknown parameters'.

6  There exist situations when the unrestricted model cannot be estimated,
but we nevertheless wish to test linear restrictions (e.g., identifying
restrictions). Under these circumstances, the LM test is the only feasible
procedure.